Let’s create a new RStudio project to work in:
In our new project, let’s create a data/ folder in which to store the data.
dir.create("data")
Let’s also open a new R script in which to work:
Save it in the project root, e.g. as metadata_dev.R
Let’s install and load all the packages we’ll need for the workshop:
install.packages("tidyverse")
install.packages("here")
install.packages("devtools")
devtools::install_github("ropenscilabs/dataspice")
library(tidyverse)
library(dataspice)
For more information on the data source, check the tutorial README.
The readr::read_csv() function allows us to read raw CSV data directly from a URL.
URL for vst_mappingandtagging.csv at bit.ly/mapping_csv
URL for vst_perplotperyear.csv at bit.ly/perplot_csv
vst_mappingandtagging <- read_csv("https://raw.githubusercontent.com/annakrystalli/dataspice-tutorial/master/data/vst_mappingandtagging.csv")
vst_perplotperyear <- read_csv("https://raw.githubusercontent.com/annakrystalli/dataspice-tutorial/master/data/vst_perplotperyear.csv")
You can inspect any object in your environment in RStudio using the View() function:
vst_mappingandtagging %>% View()
vst_perplotperyear %>% View()
write_csv(vst_mappingandtagging, here::here("data", "vst_mappingandtagging.csv"))
write_csv(vst_perplotperyear, here::here("data", "vst_perplotperyear.csv"))
Once we’ve saved our data files in the data folder, we can use functions in the dataspice package to create metadata files and complete them.
We’ll start by creating the basic metadata .csv files in which to collect metadata related to our example dataset, using the dataspice::create_spice() function:
create_spice()
This creates a metadata folder in your project’s data folder (although you can specify a different directory if required) containing 4 files in which to record your metadata.
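As a quick check, we can list the newly created templates. According to the dataspice documentation, the four files are access.csv, attributes.csv, biblio.csv and creators.csv:

```r
# List the metadata templates created by create_spice()
list.files(here::here("data", "metadata"))
```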
Let’s start with a quick and easy one: the creators. We can open and edit the file in an interactive Shiny app using dataspice::edit_creators(). Although we did not collect this data, just complete it with your own details for the purposes of this tutorial.
edit_creators()
Remember to click on Save when you’re done editing.
Before manually completing any details, we can use dataspice’s dedicated prep_access() function to extract the information required for the access.csv file:
prep_access()
Again, we can use the edit_access() function to complete the final details required, namely the URL from which each dataset can be downloaded. Use the URLs from which we downloaded each data file in the first place (hint ☝️).
We can also edit details such as the name field to something more informative if required.
Remember to click on Save when you’re done editing.
edit_access()
NEON data portal page for this dataset: http://data.neonscience.org/data-product-view?dpCode=DP1.10098.001
Before we start filling this table in, we can use some base R to extract some of the information we require. In particular, we can use the range() function to extract the temporal and spatial extents of our data.
range(vst_perplotperyear$date, vst_mappingandtagging$date)
## [1] "05/22/15" "11/18/15"
range(vst_perplotperyear$decimalLatitude)
## [1] 42.39229 44.06795
range(vst_perplotperyear$decimalLongitude)
## [1] -72.26573 -71.28145
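Note that the date columns here are character strings in month/day/year format, so range() above is comparing them alphabetically; that happens to give the right answer for these data (both dates fall in the same year), but a safer sketch (assuming lubridate, which installs with the tidyverse) parses them into real dates first:

```r
# Assumes lubridate (installed with the tidyverse); mdy() parses
# "month/day/year" strings such as "05/22/15" into proper Date objects,
# so range() compares actual dates rather than text.
library(lubridate)

dates <- mdy(c(vst_perplotperyear$date, vst_mappingandtagging$date))
range(dates)
```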
Now that we’ve got the values for our temporal and spatial extents, we can complete the fields in the biblio.csv file. Additional information required to complete these fields can be found on the NEON data portal page for this dataset and the dataspice-tutorial repository README.
edit_biblio()
data_files <- list.files(here::here("data"),
pattern = "\\.csv$",
full.names = TRUE)
data_files
## [1] "/Users/Anna/Documents/workflows/workshops/dataspice-tutorial/data/vst_mappingandtagging.csv"
## [2] "/Users/Anna/Documents/workflows/workshops/dataspice-tutorial/data/vst_perplotperyear.csv"
data_files %>% purrr::map(~prep_attributes(.x))
We can now use dataspice::edit_attributes() to fill in the final details, namely each variable’s description and units.
edit_attributes()
For dataspice, we have opted to use unit specifications that can be parsed by the R package units, which provides a class for maintaining unit metadata along with functionality for checking unit compatibility and performing conversions. units is itself based on UDUNITS-2, a C library for handling units of physical quantities, including unit definition and value conversion.
You can install the units package and search for units with it, but for quicker browsing and searching, you can use the handy “Units and Symbols Found in the UDUNITS2 Database” web app to identify prefixes and base unit definitions. For now, use the name rather than the symbol definition. More complex units can be defined arithmetically by combining base units (e.g. metre cubed could be specified as m3 or m^3; see the units vignette for more details).
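As a quick sketch (assuming the units package is installed), you can check that a unit string parses and converts as expected before entering it in attributes.csv:

```r
library(units)

# Attach a parsed unit to a value; "m^3" is cubic metres
vol <- set_units(2, "m^3")

# Unit-aware conversion: 2 cubic metres is 2000 litres
set_units(vol, "L")
```

If a unit string cannot be parsed by UDUNITS-2, set_units() will throw an error, which makes this a handy sanity check for the unitText values you record.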
Again, additional information required to complete these fields can be found on the NEON data portal page for this dataset and the dataspice-tutorial repository README
Now that all our metadata files are complete, we can compile them into a structured dataspice.json file in our data/metadata/ folder:
write_spice()
## Parsed with column specification:
## cols(
## title = col_character(),
## description = col_character(),
## datePublished = col_date(format = ""),
## citation = col_character(),
## keywords = col_character(),
## license = col_character(),
## funder = col_character(),
## geographicDescription = col_character(),
## northBoundCoord = col_double(),
## eastBoundCoord = col_double(),
## southBoundCoord = col_double(),
## westBoundCoord = col_double(),
## wktString = col_character(),
## startDate = col_date(format = ""),
## endDate = col_date(format = "")
## )
## Parsed with column specification:
## cols(
## fileName = col_character(),
## variableName = col_character(),
## description = col_character(),
## unitText = col_character()
## )
## Parsed with column specification:
## cols(
## fileName = col_character(),
## name = col_character(),
## contentUrl = col_character(),
## fileFormat = col_character()
## )
## Parsed with column specification:
## cols(
## id = col_character(),
## givenName = col_character(),
## familyName = col_character(),
## affilitation = col_character(),
## email = col_character()
## )
Here’s an interactive view of the dataspice.json file we just created:
jsonlite::read_json(here::here("data", "metadata", "dataspice.json")) %>% listviewer::jsonedit()
Publishing this file on the web means it will be indexed by Google Dataset Search! 😃 👍
Finally, we can use the dataspice.json file we just created to produce an informative README web page to include with our dataset for humans to enjoy!
We use the dataspice::build_site() function, which creates the file index.html in the docs/ folder of your project (creating the folder if it doesn’t already exist).
build_site()